MENTAL TASK CLASSIFICATION BASED ON ENTROPY, 
SPECTRAL ENTROPY AND MUTUAL INFORMATION 



F. Mamashli, A. M. Nasrabadi, SR ShotdM

Department of Biomedical Engineering, Polytechnic University, Tehran, Iran

Department of Biomedical Engineering, Shahed University, Tehran, Iran

f_mamashliy@yahoo.com 



Abstract— With recent advances in signal processing and
biomedical instrumentation, EEG signals can be used as a
new communication channel between human and
computer. This channel is implemented by recording and
analyzing brain waves. Because such a system translates
human thoughts into commands for a computer, it is
called a brain computer interface (BCI). Many feature
extraction methods exist, and Shannon's communication
theory may be used to quantify the information in BCI
data. In this paper, we evaluate which of several features
extracted from the EEG signal best suits this purpose.
We use Fourier coefficients, entropy, spectral entropy,
and mutual information as features for a linear
discriminant analysis (LDA) classifier to classify three
mental tasks, namely baseline, rotation, and
multiplication. We compare the results on 4 subjects
from Anderson's database. The results show that
classification using the combination of entropy, spectral
entropy, and mutual information as features outperforms
all the other features used.

Index Terms— Brain computer interface, entropy, spectral
entropy, mutual information, linear classification

I. INTRODUCTION 

Although the first research related to BCIs appeared
in the 1960s, the field is still in its infancy for a variety
of historic reasons. Recent changes in technology and
advances in research have changed the environment for
BCI research. Recent advances include enabling owl
monkeys to control a robotic arm and a computer cursor
through their thoughts alone in 2000, and enabling a
paralyzed 25-year-old man to do the same in 2004.
The basic model of a BCI system is shown in Fig. 1.
This diagram serves as the basis for the rest of this
section, which describes the brain signals, signal
acquisition, signal feature extraction, signal feature
translation, and some possible outputs.

BCIs are capable of monitoring various brainwave
phenomena. Examples include slow cortical potentials
(SCPs), P300 potentials (positive peaks about 300 ms
after a stimulus), and mu or beta rhythms recorded
from the scalp. Methods to observe these phenomena
include EEG, magnetoencephalography (MEG),
positron emission tomography (PET), functional
magnetic resonance imaging (fMRI), and optical
imaging. Other signals that are considered to
contaminate the user's intent are the electromyogram
(EMG) and the electrooculogram (EOG), which result
from muscle and eye movements, respectively. There are
also invasive BCI recording methods (e.g., implanted
electrodes).

Most BCIs use EEG signals, which represent the
electrical activity of the brain as measured from outside
the skull [1]. EEGs not only provide insight into
important characteristics of brain activity but also yield
clues regarding the underlying neural dynamics. The
processing of information by the brain is reflected in
dynamical changes in this electrical activity, and the
resulting variations are found in (i) time, (ii) frequency,
and (iii) space. The EEG signal is what mathematicians
call a non-stationary time series (TS). Powerful analytical
methods have been developed over the years to extract
information from TS [2]; they can be characterized as
time-domain or frequency-domain methods (or both).
This information should help the BCI system to
distinguish the parts of the signal that encode the user's
intent. A variety of feature extraction options are
currently under study, including spatial and temporal
filtering techniques, signal averaging, voltage amplitude
measurements, and spectral analysis (e.g., using the
fast Fourier transform) [1].

Entropy and spectral entropy have been used as
features in different fields, such as neonatal seizure
detection and speech recognition [15]. Entropy is a
measure of disorganization or uncertainty in a random
variable. Information can be interpreted essentially as
the negative of entropy, and the self-information of an
event as the negative logarithm of its probability [10].

Feature extraction based on entropy and spectral
entropy is expected to improve classification accuracy
because it reduces the feature dimension from 30 for the
Fourier coefficients to 6 for entropy or spectral entropy.
Commonly used EEG features for this purpose include
Fourier coefficients, entropy, spectral entropy (SE), and
mutual information. Entropy is usually used in the
context of pattern classification and information
technology. It was originally defined for information
sources by Shannon [9].

Mathematical Background 

In 1948, Claude Shannon defined the information-theoretic



PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006© 



Fig. 1. BCI Procedure (block diagram stages: Signal Acquisition, Digitized Signal, Signal Processing (Feature Extraction, Translation), Commands, BCI Application)



concept of entropy, which he described as the mean, or
expected value, of the information: entropy measures the
average uncertainty, or the average self-information,
conveyed by an event [4]. Self-information is the
information conveyed by an event that occurs with
probability p(x). Since self-information measures
uncertainty, uncertain events carry a lot of information,
and a source whose events are uncertain therefore has
higher entropy. Thus, the higher the entropy, the greater
the uncertainty as to which event will occur. If x is a
discrete random variable, its entropy is defined as:



$$H(x) = E_x[i(x)] = -\sum_{i=1}^{N} p(x_i)\,\log_2 p(x_i) \qquad (1)$$

where E_x is the expectation with respect to x,
i(x) = -log2 p(x) is the self-information, and p(x) is the
discrete probability distribution function (PDF). The
discrete PDF is estimated by dividing the range of x into
N amplitude bins, counting how many values of x fall in
each bin, and normalizing by the total number of values.
Any logarithm base may be used as long as it is applied
consistently; Eq. (1) uses base 2, so the entropy is
expressed in bits.
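The histogram estimate of Eq. (1) described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `shannon_entropy` is hypothetical, and the number of amplitude bins N (here `n_bins=10`) is an assumed value, since the paper does not report it.

```python
import numpy as np

def shannon_entropy(x, n_bins=10):
    """Histogram-based Shannon entropy (bits) of a 1-D signal, following Eq. (1).

    n_bins plays the role of N; its value is an assumption (not given in the paper).
    """
    counts, _ = np.histogram(np.asarray(x, dtype=float), bins=n_bins)
    p = counts / counts.sum()          # normalize bin counts to a discrete PDF
    p = p[p > 0]                       # empty bins contribute nothing (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

# A signal spread evenly over 4 bins has entropy log2(4) = 2 bits.
print(shannon_entropy([0, 1, 2, 3], n_bins=4))  # 2.0
```

Applied to each channel of a 2-second window, a function like this yields one value per channel, i.e., a 6-element entropy vector for the six electrodes.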
Shannon's definition has been applied, modified, and
validated in a variety of fields. In 1979, Powell and
Percival introduced the concept of spectral entropy (SE),
based on the peaks of the Fourier spectrum, as a measure
of regularity [6]. Inouye et al. argued that spectral
entropy is a useful means of quantifying the degree of
EEG irregularity [7]. To compute the SE, the power
spectrum of the discrete-time signal is first obtained from:

$$P(k) = \frac{1}{NT}\,|X(k)|^{2} \qquad (2)$$

$$X(k) = \sum_{n=0}^{N-1} x[n]\,e^{-j2\pi kn/N} \qquad (3)$$

where X(k) is the discrete Fourier transform (DFT) of
x[n], T is the sampling interval, and N is the number of
samples in the data set. The SE is then expressed as [3]:

$$H = -\sum_{k} P(k)\,\log_2 P(k)$$
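A minimal sketch of the SE computation in Eqs. (2) and (3) follows, using the FFT to obtain X(k). Two details are assumptions rather than statements from the paper: only the non-negative-frequency half of the spectrum is kept (the spectrum of a real signal is symmetric), and P(k) is normalized to sum to one so that it can be treated as a probability distribution. The function name `spectral_entropy` is hypothetical.

```python
import numpy as np

def spectral_entropy(x, fs=250.0):
    """Spectral entropy (bits) of one window, following Eqs. (2)-(3) and the SE formula.

    Assumed: non-negative frequencies only, and P(k) normalized to unit sum.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    t = 1.0 / fs                      # sampling interval T
    X = np.fft.rfft(x)                # X(k) of Eq. (3) for k = 0..n//2, via the FFT
    P = np.abs(X) ** 2 / (n * t)      # P(k) of Eq. (2)
    total = P.sum()
    if total == 0.0:                  # all-zero window: no spectral content
        return 0.0
    P = P / total                     # normalize to a distribution (cancels the 1/(NT) factor)
    P = P[P > 0]                      # 0 * log 0 is taken as 0
    return float(-np.sum(P * np.log2(P)))
```

A pure sinusoid concentrates its power in a single bin and gives an SE close to zero, whereas white noise spreads power over all bins and gives a high SE; this is the regularity contrast that makes SE useful as an EEG feature.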



The third feature in our study was mutual information,
which has been widely used in feature selection and in
studies of Alzheimer's disease. Expected mutual
information is a measure of dependence between two
dimensions, i.e., the extent to which events tend to
co-occur in particular combinations. In this respect it is
comparable to the product-moment correlation
coefficient, in the same way that entropy is comparable
to the variance. Mutual information is given by:

$$J(X,Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}\,\log_2\!\left(\frac{p_{ij}}{p_{i}\,p_{j}}\right)$$



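where p_ij is the joint probability that X takes its i-th
value and Y its j-th value, and p_i and p_j are the
corresponding marginal probabilities. Mutual information
is sometimes also denoted by M(X,Y) or T(X,Y). It can
be shown that J(X,Y) ≥ 0 and that

$$J(X,Y) = H(Y) - H_X(Y) \quad\text{and}\quad J(X,Y) = H(X) - H_Y(X)$$

where H_X(Y) is the conditional entropy of Y given X
(Theil 1972: 125-131). It can further be derived that the
joint (multidimensional) entropy equals the sum of the
marginal entropies minus the mutual information
(Theil 1972: 126):

$$H(X,Y) = H(X) + H(Y) - J(X,Y)$$

[16].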
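The double-sum formula for J(X,Y) can be estimated from a two-dimensional histogram of two signals, as in the sketch below. The function name `mutual_information` and the bin count `n_bins=10` are assumptions for illustration; the paper does not state how the joint PDF was estimated.

```python
import numpy as np

def mutual_information(x, y, n_bins=10):
    """Histogram estimate of J(X,Y) in bits, following the double-sum formula.

    n_bins is an assumed value; the paper does not specify the PDF estimator.
    """
    joint, _, _ = np.histogram2d(np.asarray(x, float), np.asarray(y, float), bins=n_bins)
    p_ij = joint / joint.sum()                 # joint probabilities p_ij
    p_i = p_ij.sum(axis=1, keepdims=True)      # marginal of X, shape (n_bins, 1)
    p_j = p_ij.sum(axis=0, keepdims=True)      # marginal of Y, shape (1, n_bins)
    outer = p_i * p_j                          # p_i * p_j for every (i, j) via broadcasting
    nz = p_ij > 0                              # 0 * log 0 is taken as 0
    return float(np.sum(p_ij[nz] * np.log2(p_ij[nz] / outer[nz])))

# Identical signals: J(X,X) = H(X); independent signals: J = 0.
print(mutual_information([0, 1, 2, 3], [0, 1, 2, 3], n_bins=4))  # 2.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], n_bins=2))  # 0.0
```

The two printed cases illustrate the bounds stated above: J(X,Y) is zero for independent variables and reaches H(X) when Y fully determines X.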

II. METHOD

A. The Data Set



The data set used in this study was obtained from
Colorado State University. The data were recorded
from electrodes C3, C4, P3, P4, O1, and O2, placed
according to the international 10-20 system. Each
recorded signal is 10 seconds long, with a sampling rate
of 250 Hz. Seven subjects participated in the recordings,
but we used only the signals of the first, third, fifth, and
sixth subjects because of their larger number of trials
[17]. The mental tasks considered in the data collection
include baseline, mental multiplication, and mental 3-D
geometric rotation. Our goal is to discriminate these
mental tasks from each other using the features
mentioned above and to compare the results. The EOG
signal, which was recorded simultaneously with the
EEG signals, served to remove artifacts caused by eye
movements.

B. Data Processing

First, using the EOG signal and an appropriate
empirical threshold value, sections of the EEG signals
coinciding with eye movements were identified and
removed from the data set, yielding signals free of
eye-movement artifacts. The resulting signals were
then divided into windows of 2 seconds each, with an
overlap of 1 second.
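The preprocessing steps above can be sketched as follows. The paper does not report the threshold value or how removed sections were joined, so this sketch assumes sample-wise rejection wherever |EOG| exceeds a hypothetical `threshold`, followed by concatenation of the remaining samples; the function names are also hypothetical. At 250 Hz, a 2-second window is 500 samples and a 1-second overlap means the window advances by 250 samples.

```python
import numpy as np

FS = 250        # sampling rate (Hz), from the data set description
WIN = 2 * FS    # 2-second window = 500 samples
STEP = 1 * FS   # 1-second overlap, so the window advances by 250 samples

def remove_eog_artifacts(eeg, eog, threshold):
    """Drop EEG samples where |EOG| exceeds a threshold.

    eeg: (n_channels, n_samples); eog: (n_samples,).
    `threshold` is hypothetical; the paper only says it was chosen empirically.
    """
    keep = np.abs(eog) <= threshold
    return eeg[:, keep]

def sliding_windows(eeg, win=WIN, step=STEP):
    """Split (n_channels, n_samples) into overlapping windows: (n_windows, n_channels, win)."""
    n_ch, n = eeg.shape
    starts = range(0, n - win + 1, step)
    if len(starts) == 0:                      # signal shorter than one window
        return np.empty((0, n_ch, win))
    return np.stack([eeg[:, s:s + win] for s in starts])
```

For an artifact-free 10-second trial (2500 samples), this gives (2500 - 500) / 250 + 1 = 9 windows per trial.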

C. Extracting Features 

Feature extraction is a process focused on discovering 
a pattern that can differentiate various classes. Usually 






two approaches are used to extract features from
EEG signals. The first approach is based on the
characteristic P300 signal that follows the occurrence of
an event, referred to as an event-related potential (ERP).
Since this approach relies on a device that mediates
between the subject and the stimuli corresponding to the
event, it is impractical in our study. Hence, we employ
the second approach [8].

As previously mentioned, we used several methods to
extract useful features from the EEG signal. They are
described as follows:

1) Spectrum analysis: The power spectral density (PSD)
of the signals is used as features in this application [8].
The PSD of the clean EEG signals is integrated over five
frequency bands: 0-3.5 Hz (delta), 4-7 Hz (theta), 8-13 Hz
(alpha), 14-34 Hz (beta), and above 35 Hz (gamma)
[11]. Previous studies show that these frequency bands
change their characteristics while mental tasks are
performed [12], [13], and one of the best ways of
detecting these changes is through their power spectral
density [14].

2) Entropy (E): As discussed above, entropy is a
measure of the amount of disorder in a signal; another
way of looking at it is that entropy describes how evenly
a set of numbers is spread. An example of the entropy
calculated for a trial of EEG channel C3 in the three tasks
is shown in Fig. 2a. This figure indicates that the entropy
level of the EEG signals changes across tasks, while it
varies little across channels; that is, the mean value of
entropy is almost constant over the 6 channels.

3) Spectral entropy (SE): This feature was computed
over each sliding window segment using the
mathematical technique described in Section I, with a
250-point (1-second) overlap. In our algorithm, we
calculated the FFT as a fast means of obtaining the
DFT. The SE was computed for all channels over each
segment [3]. An example of the spectral entropy is
shown in Fig. 2b.

4) Mutual information (MI): Mutual information is
widely used, in a descriptive way, to measure the
stochastic dependence of categorical random
variables. In our study, we estimated the mutual
information between each segment of one EEG
channel and the corresponding segments of the other
channels, i.e., once for every pair of the 6 channels.
The corresponding feature vector therefore consists of
6 × 5 / 2 = 15 elements.
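The two multi-value features in the list above, band powers (item 1) and pairwise mutual information (item 4), can be sketched for one 2-second window as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, the PSD is assumed to be a plain periodogram, the gamma band is assumed to end at the Nyquist frequency (125 Hz), the MI histogram uses an assumed 10 bins, and MI is computed through the identity J(X,Y) = H(X) + H(Y) - H(X,Y).

```python
from itertools import combinations
import numpy as np

FS = 250  # sampling rate (Hz), from the data set description
# Band edges from item 1; the gamma upper edge (Nyquist) is an assumption.
BANDS = [(0.0, 3.5), (4.0, 7.0), (8.0, 13.0), (14.0, 34.0), (35.0, FS / 2)]

def band_power_features(window, fs=FS):
    """(n_channels, n_samples) -> n_channels * 5 band powers (30 for six channels)."""
    n = window.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Periodogram estimate of the PSD (one-sided doubling omitted for simplicity).
    psd = np.abs(np.fft.rfft(window, axis=-1)) ** 2 / (fs * n)
    df = freqs[1] - freqs[0]
    feats = []
    for ch_psd in psd:
        for lo, hi in BANDS:
            mask = (freqs >= lo) & (freqs <= hi)
            feats.append(ch_psd[mask].sum() * df)   # integrate the PSD over the band
    return np.array(feats)

def _entropy_bits(p):
    p = p[p > 0]                                     # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

def pairwise_mi_features(window, n_bins=10):
    """(n_channels, n_samples) -> one MI value per unordered channel pair (15 for six)."""
    feats = []
    for a, b in combinations(range(window.shape[0]), 2):
        joint, _, _ = np.histogram2d(window[a], window[b], bins=n_bins)
        p_xy = joint / joint.sum()
        h_x = _entropy_bits(p_xy.sum(axis=1))        # H(X) from the row marginal
        h_y = _entropy_bits(p_xy.sum(axis=0))        # H(Y) from the column marginal
        h_xy = _entropy_bits(p_xy.ravel())           # joint entropy H(X,Y)
        feats.append(h_x + h_y - h_xy)               # J(X,Y) = H(X) + H(Y) - H(X,Y)
    return np.array(feats)
```

For example, a 10 Hz sinusoid on every channel puts almost all of its power in the alpha entry of each channel's 5-band block, and six identical channels give 15 MI values that all equal the single-channel entropy.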



baseline 



baseline 




40 50 

multiplication 



10 



20 



30 40 50 60 

rotation 



70 80 90 





4.5 



10 20 30 40 50 60 70 80 90 

rotation 



5.4 r 




10 20 30 40 50 60 70 80 90 

b) 

Fig. 2. EEG pattern in three different tasks: 
a) Entropy b) Spectral entropy 

D. Classification

The extracted features were fed into a linear
discriminant analysis (LDA) classifier, which uses a
probability model for the discriminant function. Let the
class-conditional distribution of class i be f_i(x) and its
prior probability be π_i, where x is the observation of
dimension q. The posterior probability of class i is then

$$P(\text{class} = i \mid x) = \frac{f_i(x)\,\pi_i}{\sum_{j} f_j(x)\,\pi_j}$$






It can be shown that the rule that maximizes the
conditional probability above gives the smallest
number of misclassifications; this is known as Bayes'
rule. If we further assume that the classes have
Gaussian distributions with means μ_i and a common
covariance Σ, then maximizing the conditional
probability is equivalent to finding the class i that
maximizes

$$L_i = x^{T}\Sigma^{-1}\mu_i - \tfrac{1}{2}\,\mu_i^{T}\Sigma^{-1}\mu_i + \log \pi_i$$

If we apply maximum likelihood estimation to calculate
μ_i, we arrive at linear discriminant analysis [18].
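A minimal NumPy sketch of the decision rule L_i above is given below: class means μ_i, class frequencies as priors π_i, and a pooled maximum-likelihood covariance Σ shared by all classes. The function names are hypothetical, and the use of a pseudo-inverse (in case Σ is singular) is an assumption, not a detail from the paper. A library implementation such as scikit-learn's `LinearDiscriminantAnalysis` could be used instead.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class means mu_i, priors pi_i and a shared covariance Sigma."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])   # mu_i
    priors = np.array([np.mean(y == c) for c in classes])         # pi_i (class frequencies)
    resid = X - means[np.searchsorted(classes, y)]                 # within-class residuals
    cov = resid.T @ resid / X.shape[0]                             # pooled ML covariance Sigma
    return classes, means, priors, np.linalg.pinv(cov)            # pinv guards against singular Sigma

def predict_lda(X, classes, means, priors, cov_inv):
    """Assign each row x to the class i that maximizes L_i."""
    X = np.asarray(X, dtype=float)
    linear = X @ cov_inv @ means.T                                 # x^T Sigma^-1 mu_i
    quad = 0.5 * np.einsum("ij,jk,ik->i", means, cov_inv, means)   # mu_i^T Sigma^-1 mu_i / 2
    scores = linear - quad + np.log(priors)                        # L_i for every (row, class)
    return classes[np.argmax(scores, axis=1)]
```

In this setting, X would hold one feature vector per 2-second window (e.g., 27 columns for E+SE+MI) and y the task label (baseline, multiplication, or rotation).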

III. RESULTS

In the simulations, three classes were classified
with feature vectors of 6 dimensions for entropy and for
spectral entropy, 15 for mutual information, and 30 for
the frequency features. The classification results of the
LDA classifier are shown in Table 1. As can be
observed from the fourth row of this table, combining the
entropy, spectral entropy, and mutual information features
into a single classifier input with a total dimension of 27
gives the highest accuracy for all four subjects. The
poorer results for subjects 3 and 5 might be due to a lack
of attention during the recordings, so that their EEG
signals differ substantially from those of subjects 1 and 6.
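The feature dimensions quoted above can be checked with a few lines of arithmetic. The sketch assumes, consistent with the feature descriptions in Section II, one entropy and one spectral-entropy value per channel, one band power per channel and frequency band, and one MI value per unordered channel pair; the variable names are illustrative.

```python
from math import comb

N_CHANNELS = 6  # C3, C4, P3, P4, O1, O2
N_BANDS = 5     # delta, theta, alpha, beta, gamma

dims = {
    "E": N_CHANNELS,                  # one entropy value per channel
    "SE": N_CHANNELS,                 # one spectral-entropy value per channel
    "MI": comb(N_CHANNELS, 2),        # one MI value per unordered channel pair
    "Fourier": N_CHANNELS * N_BANDS,  # one band power per channel and band
}
dims["E+SE+MI"] = dims["E"] + dims["SE"] + dims["MI"]
print(dims)  # {'E': 6, 'SE': 6, 'MI': 15, 'Fourier': 30, 'E+SE+MI': 27}
```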

IV. CONCLUSION 






The purpose of this study was to introduce new
features based on entropy and to evaluate their
performance as features for a BCI. We evaluated the
performance of different features, namely entropy,
spectral entropy, mutual information, and Fourier
coefficients, as inputs to an LDA classifier for the
classification of three mental tasks performed by 4
subjects. Although the best classification results were
obtained with the combination of entropy, spectral
entropy, and mutual information, results based solely
on entropy features were also promising.

Table 1

Classification results (%) with test data using different features

Feature                   Sub1          Sub3          Sub5          Sub6
Entropy (E)               90.48 ± 12.5  56.58 ± 39.1  62.38 ± 27    79.18 ± 30
Spectral entropy (SE)     94.09 ± 9.7   62.14 ± 29    65.79 ± 25    80.72 ± 24
Mutual information (MI)   84.23 ± 18.5  57.57 ± 30    54.61 ± 29    84.51 ± 16
E+SE+MI                   96.58 ± 4.4   66.74 ± 35    71.16 ± 23    92.5 ± 12
Fourier coeff.            95.62 ± 4.9   61.85 ± 35    66.18 ± 29    91.52 ± 8.8



Table 2

Classification results (%) with test data using different features

Feature                   Sub1          Sub3          Sub5          Sub6
Entropy (E)               91.98 ± 0.9   61.3 ± 4.7    65.32 ± 1.8   81.68 ± 1.7
Spectral entropy (SE)     94.3 ± 0.5    66.79 ± 3.7   67.18 ± 1.6   82.21 ± 2.8
Mutual information (MI)   89.2 ± 1.5    64.86 ± 3.08  60.42 ± 3.2   88.58 ± 1.99
E+SE+MI                   98.3 ± 0.4    78.06 ± 3.6   78.39 ± 1.8   97.23 ± 0.7
Fourier coeff.            97.45 ± 0.32  75.48 ± 4.13  79.46 ± 3.2   96.33 ± 0.56



Fig. 3. EEG signal from subject 3 in the second baseline trial (panels: channels C4, C3, and P3)

REFERENCES 

[1] J. Thorpe, P. C. van Oorschot, A. Somayaji, "Pass-thoughts: Authenticating with our minds," Digital Security Group, School of Computer Science, Carleton University, April 18, 2005.

[2] A. Plastino, O. A. Rosso, "Entropy and statistical complexity in brain activity," Europhysics News Magazine, p. 224, 2005.

[3] M. M. D'Alessandro, G. J. Vachtsevanos, R. Esteller, J. Echauz, A. Koblasz, B. Litt, "Spectral entropy and neural involvement in patients with mesial temporal lobe epilepsy," METMBS'2000, Las Vegas, Nevada, June 26-29, 2000.

[4] W. Gappmair, "Claude E. Shannon: The 50th anniversary of information theory," IEEE Communications Magazine, vol. 37, no. 4, pp. 102-105, 1999.

[5] C. E. Shannon, Claude Elwood Shannon: Collected Papers. New York: IEEE Press, 1993.

[6] R. Q. Quiroga, Quantitative Analysis of EEG Signals: Time-Frequency Methods and Chaos Theory, Ph.D. dissertation, Medical University of Lübeck, 1998.

[7] T. Inouye et al., "Quantification of EEG irregularity by use of the entropy of the power spectrum," Electroencephalography and Clinical Neurophysiology, vol. 79, pp. 204-210, 1991.

[8] T. Lan, A. Adami, D. Erdogmus, M. Pavel, "Estimating cognitive state using EEG signals," unpublished.

[9] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, Oct. 1948.

[10] A. M. Toh, R. Togneri, S. Nordholm, "Spectral entropy as speech features for speech recognition," unpublished.

[11] E. Niedermeyer, F. Lopes da Silva, "Electroencephalography: Basic Principles, Clinical Applications, and Related Fields," unpublished.

[12] E. Basar, C. Basar-Eroglu, S. Karakas, M. Schürmann, "Gamma, alpha, delta, and theta oscillations govern cognitive processes," International Journal of Psychophysiology, vol. 39, 2001.

[13] G. Pfurtscheller, F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles," Clinical Neurophysiology, vol. 110, 1999.

[14] S. Solhjoo, M. H. Moradi, "Mental task recognition: a comparison between some of classification methods," BIOSIGNAL 2004.

[15] S. Faul, G. Boylan, S. Connolly, W. Marnane, G. Lightbody, "A novel automatic neonatal seizure detection system," ISSC, 2005.

[16] K. Frenken, "Entropy statistics and information theory," The Elgar Companion to Neo-Schumpeterian Economics, July 2003.

[17] A. R. Omidvarnia, F. Atry, S. K. Setarehdan, B. N. Arabi, "Kalman filter parameters as a new EEG feature vector for BCI application," University of Tehran, Iran.

[18] F. Abdollahi, A. M. Nasrabadi, "Combination of frequency bands in EEG for feature reduction," EMBC IEEE, New York, 2005.






